feat: add dataset collection endpoint, script to scrape validator db #78

Merged
merged 9 commits into from
Dec 4, 2024


jarvis8x7b (Member) commented Nov 28, 2024

context:

  • each validator's DB may be very large, so the JSONL dataset is chunked into 50 MB parts before calling the validator API
  • added an `entrypoints/` folder for the dataset service; unsure whether we will use it for other things in the future. Left `scripts/extract_dataset.py` as a one-off script because it is only run once in a while
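The chunking step in the first bullet could look roughly like the sketch below. It is not the code from this PR: it assumes records arrive as Python dicts and reads "50mb" as 50 MiB of UTF-8 encoded bytes, and the name `chunk_jsonl` is hypothetical.

```python
import json
from typing import Iterable, Iterator

# Assumption: "50mb" is read as 50 MiB of UTF-8 encoded JSONL.
MAX_CHUNK_BYTES = 50 * 1024 * 1024


def chunk_jsonl(records: Iterable[dict], max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[bytes]:
    """Serialise records as JSONL and yield payloads of at most max_bytes.

    Chunks are only split between lines, so every payload is valid JSONL.
    A single record larger than max_bytes is yielded as its own (oversized) chunk.
    """
    buf: list[bytes] = []
    size = 0
    for record in records:
        line = (json.dumps(record, ensure_ascii=False) + "\n").encode("utf-8")
        # Flush before this line would push the current chunk past the limit.
        if buf and size + len(line) > max_bytes:
            yield b"".join(buf)
            buf, size = [], 0
        buf.append(line)
        size += len(line)
    if buf:
        yield b"".join(buf)
```

Each yielded payload would then be sent to the validator API as one request (e.g. `for part in chunk_jsonl(rows): ...`). Splitting only at line boundaries keeps every part independently parseable, at the cost that one record larger than the limit produces a single oversized part.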

jarvis8x7b force-pushed the feature/dataset-collection branch 4 times, most recently from 79dd239 to a2c454c on December 2, 2024 13:00
jarvis8x7b self-assigned this Dec 3, 2024
jarvis8x7b marked this pull request as ready for review December 3, 2024 09:02
build: add deps for dataset service
build: add docker compose services & entrypoints
jarvis8x7b force-pushed the feature/dataset-collection branch from 6c7bc5c to b6eb648 on December 3, 2024 09:05

karootplx (Collaborator) left a comment

lgtm

jarvis8x7b changed the title from "feature/dataset collection" to "feat: add dataset collection endpoint, script to scrape validator db" Dec 4, 2024
jarvis8x7b merged commit 4369cf1 into dev Dec 4, 2024
1 check passed
Tedbees pushed a commit that referenced this pull request Dec 4, 2024
…78)

* feat: add validator api dataset service, extract dataset script

build: add deps for dataset service
build: add docker compose services & entrypoints

* chore: remove test script

* chore: fix comment

* chore: update readme and env example

* fix: add auth

* refactor: add hotkey to filename

* chore: format toml
codebender37 pushed a commit that referenced this pull request Dec 9, 2024
karootplx pushed a commit that referenced this pull request Dec 9, 2024
jarvis8x7b added a commit that referenced this pull request Dec 10, 2024
* feat: simulator

* chore: ruff lint

* fix: scoring interval

* refactor: migrate score data from db to .pt file

* fix: add VALIDATOR_MIN_STAKE in environment

* feat: added script to inspect score

* fix: fixed from PR feedback

* fix: fixed linter issue, and commitizen

* feat: add dataset collection endpoint, script to scrape validator db (#78)

* feat: add validator api dataset service, extract dataset script

build: add deps for dataset service
build: add docker compose services & entrypoints

* chore: remove test script

* chore: fix comment

* chore: update readme and env example

* fix: add auth

* refactor: add hotkey to filename

* chore: format toml

* fix: minersim task scoring, validatorsim subtensor retry mechanism and dendrite forward timeout

* chore: ruff lint

* fix: ground truth arr ordering

* perf: increase timeout

* fix: negative stride error from scoring (#86)

* fix: add .copy() when creating tensors from np array

* chore: deleted duplicate logs

* fix: wandb now storing final scores correctly (#87)

* fix: wandb missing final scores (#88)

* fix: wandb now storing final scores correctly

* fix: cast scores from tensor.float to primitive float

* chore: commitizen ci actions (#92)

* ci: add conventional commits CI

* refactor: edits from pr feedback

* perf: add subtensor retries (#90)

* perf: add subtensor retries

* fix: add retries to sync function

* chore: updated pyproject for dev dependencies (#94)

- added tabulate and termcolor to dev dependencies
- fixed inspect_score.py using the wrong file name

* fix: use correct dimension when calculating mean scores

---------

Co-authored-by: tedbee <tedbee@tensorplex.ai>
Co-authored-by: karootplx <karoo@tensorplex.ai>
Co-authored-by: codebender <167290009+codebender37@users.noreply.github.com>
Co-authored-by: mediumsizeworkingdog <mediumsizeworkingdog@tensorplex.ai>
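The negative-stride fix (#86) relates to a known PyTorch limitation: `torch.from_numpy` rejects arrays with negative strides, which NumPy produces for reversed views such as `arr[::-1]` or `np.flip(arr)`. The sketch below assumes the scoring array was a reversed view (the PR does not say how it arose) and shows why `.copy()` resolves it; the values and the helper `has_negative_stride` are made up for illustration, and the torch step is skipped if PyTorch is not installed.

```python
import numpy as np

try:
    import torch  # optional: only used to demonstrate the final conversion
except ImportError:
    torch = None


def has_negative_stride(arr: np.ndarray) -> bool:
    """True if any axis of arr is traversed backwards in memory."""
    return any(s < 0 for s in arr.strides)


# Hypothetical scores; [::-1] returns a reversed *view* over the same buffer.
scores = np.array([0.1, 0.5, 0.9], dtype=np.float32)
reversed_view = scores[::-1]
print(reversed_view.strides)  # (-4,)

# .copy() materialises a new C-contiguous buffer with positive strides.
fixed = reversed_view.copy()
print(fixed.strides, fixed.flags["C_CONTIGUOUS"])  # (4,) True

if torch is not None:
    # torch.from_numpy(reversed_view) would raise ValueError here;
    # the copied array converts cleanly.
    tensor = torch.from_numpy(fixed)
    print(tensor.shape)
```

The copy costs one extra allocation per conversion, which is usually negligible next to scoring itself.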
Tedbees pushed a commit that referenced this pull request Dec 10, 2024
## [1.5.0](v1.4.2...v1.5.0) (2024-12-10)

### Features

* add dataset collection endpoint, script to scrape validator db ([#78](#78)) ([1b4ef9e](1b4ef9e))
* added script to inspect score ([f9dd7e4](f9dd7e4))
* simulator ([ed110e3](ed110e3))

### Bug Fixes

* add VALIDATOR_MIN_STAKE in environment ([84774a4](84774a4))
* fixed from PR feedback ([23d085a](23d085a))
* fixed linter issue, and commitizen ([553ad66](553ad66))
* ground truth arr ordering ([75f008a](75f008a))
* minersim task scoring, validatorsim subtensor retry mechanism and dendrite forward timeout ([956f5cf](956f5cf))
* negative stride error from scoring ([#86](#86)) ([50c86c6](50c86c6))
* scoring interval ([94e050b](94e050b))
* use correct dimension when calculating mean scores ([2f30931](2f30931))
* wandb missing final scores ([#88](#88)) ([0d8b93d](0d8b93d))
* wandb now storing final scores correctly ([#87](#87)) ([affcd3e](affcd3e))

### Performance Improvements

* add subtensor retries ([#90](#90)) ([ca07d46](ca07d46))
* increase timeout ([23669b7](23669b7))
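The "add subtensor retries" change (#90) is, in spirit, a retry wrapper around chain calls. A generic sketch with exponential backoff follows; `with_retries`, its parameters, and the delay schedule are hypothetical, since the PR does not show its actual retry logic.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on retry_on with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # loop always returns or raises
```

Usage would look like `with_retries(lambda: fetch_metagraph(), attempts=5)`, where `fetch_metagraph` stands in for whichever subtensor call needs protecting. Narrowing `retry_on` to network errors avoids retrying genuine bugs.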
3 participants